GH-49438: [C++][Gandiva] Optimize LPAD/RPAD functions#49439
Conversation
|
|
Local Benchmark ResultsPlatform: Apple M3, macOS The original RPAD was pathologically slow compared to LPAD due to different algorithms. For 65K padding: LPAD took ~29ms while RPAD took ~992ms (34x slower for identical operation). The optimization applies the same efficient algorithm to both functions. LPAD (mean time in μs)
RPAD (mean time in μs)
|
|
|
1 similar comment
|
|
|
@lriggs @akravchukdremio @xxlaykxx You may want to review this. |
kou
left a comment
There was a problem hiding this comment.
It seems that lpad_utf8_int32_utf8() and rpad_utf8_int32_utf8() have many duplicated code. Can we factor out it as a helper function?
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 5707713. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
The
lpad_utf8_int32_utf8andrpad_utf8_int32_utf8functions have performance inefficiency and a potential memory safety issue:memsetwould suffice. Multi-byte fills use O(n) iterations instead of O(log n) with a doubling strategy.What changes are included in this PR?
std::min(fill_text_len, total_fill_bytes)for the initial copy to prevent overflowmemsetAre these changes tested?
Yes. Added tests covering:
Are there any user-facing changes?
No.